Vh replacement refers to RAG-mediated secondary recombination of the IgH genes, which
renews almost the entire Vh gene coding region but retains a short stretch of nucleotides as
a Vh replacement footprint at the newly generated Vh-Dh junction. To explore the biological
significance of Vh replacement to the antibody repertoire, we developed a Java-based Vh
replacement footprint analyzer program and analyzed the distribution of Vh replacement
products in 61,851 human IgH gene sequences downloaded from the NCBI database.
The initial assignment of the Vh, Dh, and Jh gene segments provided a comprehensive
view of the human IgH repertoire. Notably, the overall frequency of Vh replacement
products is 12.1%, and the frequencies of Vh replacement products in IgH genes using
different Vh germline genes vary significantly. Importantly, the frequencies of Vh replacement
products are significantly elevated in IgH genes derived from different autoimmune
diseases, including rheumatoid arthritis, systemic lupus erythematosus, and allergic rhinitis,
and in IgH genes encoding various autoantibodies or anti-viral antibodies. The identified
Vh replacement footprints preferentially encoded charged amino acids to elongate IgH
CDR3 regions, which may contribute to their autoreactivities or anti-viral functions.
Analyses of the mutation status of the identified Vh replacement products suggested that they
had been actively involved in immune responses. These results provide a global view of
the distribution of Vh replacement products in human IgH genes, especially in IgH genes
derived from autoimmune diseases and anti-viral immune responses.
INTRODUCTION

To protect our body from various infectious agents, the adaptive
immune system has evolved the capability to generate a vast
number of antibody (Ab) specificities through somatic rearrangement
of previously separated variable (V), diversity (D) (for heavy
chain only), and joining (J) gene segments to form the variable
domain exons of immunoglobulin genes (1-3). V(D)J recombination
is catalyzed by a pair of recombination activating gene
products (RAG1 and RAG2) (4-6). Specific joining of the V, D, and
J gene segments is directed by the recombination signal sequences
(RSS) flanking each rearranging gene segment (7). The RSS is
composed of a highly conserved heptamer (5'-CACTGTG-3') and
a nonamer (5'-ACAAAAACC-3') separated by a non-conserved
spacer region of either 12 or 23 bp in length (7-9). There are
44 functional Vh genes, 27 Dh genes, and 6 Jh genes within the
human IgH locus. The diversified IgH repertoire is generated at
different levels, including the random recombination of V, D, and J
gene segments, imprecise processing of the coding ends, addition
of non-templated nucleotides by terminal deoxynucleotidyl
transferase (TdT), random pairing of IgH with Igκ or Igλ light chains,
and later through somatic hypermutation and class switch
recombination during the antigen-dependent germinal center reaction (2).
Previous analyses of the IgH repertoire have provided important
information regarding the developmental process and function of
B lineage cells (10, 11). For example, earlier studies on the
expression and rearrangement status of IgH genes demonstrated that
IgH genes are rearranged sequentially during early B lineage cell
development, in which Dh to Jh rearrangement occurs prior to
Vh to DJh rearrangement, followed by rearrangement of the Igκ
and then Igλ light chain genes (12, 13). Analyses of the Ig gene
repertoires of different autoimmune diseases such as rheumatoid
arthritis (RA) and systemic lupus erythematosus (SLE) revealed
skewed usages of specific germline Vh genes (14-16), unusually
long CDR3 regions within the IgH and IgL genes (17, 18), and
accumulation of somatic hypermutation in the variable regions of
IgH and IgL genes (15, 19).

Abbreviations: aa, amino acid; cRSS, cryptic recombination signal sequence; EBV,
Epstein-Barr virus; HBV, hepatitis B virus; HCV, hepatitis C virus; HIV, human
immunodeficiency virus; RA, rheumatoid arthritis;
RAG, recombination activating gene products; SLE, systemic lupus erythematosus;
VhRFA, Vh replacement footprint analyzer.

The random process of V(D)J recombination is essential for
generating a diverse IgH repertoire; however, it also produces non-
functional IgH genes or IgH genes encoding autoreactive antigen 
receptors (2, 20). Early B lineage cells carrying non-functional IgH 
rearrangements must re-initiate the V(D)J recombination process
to generate functional B-cell receptors (BCRs) for subsequent 
development; on the other hand, B-cells expressing autoreactive 
receptors will be removed from the repertoire through receptor 
editing, clonal deletion, or anergy to establish central tolerance 
(1, 21, 22). Receptor editing refers to RAG-mediated secondary 
recombination of previously rearranged IgH or IgL genes (1, 21, 
22). The organizations of the Igκ and Igλ loci allow continuous
secondary recombination by joining an upstream Vl gene with a 
downstream Jl gene segment. The previously formed VlJl joints 
are deleted during secondary recombination leaving no trace in 
the newly formed VlJl junctions; the only indication of extensive 
light chain gene editing is the elevated usage of the 3' Jκ or Jλ genes
and the deletion of the Igκ locus (23, 24).

The unwanted IgH genes can also be changed through a RAG-
mediated Vh replacement process using the cryptic recombination
signal sequences (cRSSs) embedded within the framework-3
regions of previously rearranged Vh genes (21, 22, 25). The con- 
cept of Vh replacement was originally proposed to explain the 
observation that functional IgH genes were generated in mouse 
pre-B-cell leukemia lines initially harboring non-functional IgH 
rearrangements (26-28). Comparison of the functional IgH genes 
versus the non-functional IgH rearrangements suggested a Vh to 
VhDJh recombination process mediated by the cRSS sites (26, 
27). Subsequently, the occurrence of Vh replacement was
demonstrated in mouse models carrying knocked-in IgH genes 
encoding anti-DNA Abs, anti-NP Abs, or non-functional IgH 
genes in both alleles (29-34). Despite these findings, the natural 
occurrence of Vh replacement during early B-cell development in 
mouse remains to be determined (35, 36). 

Ongoing Vh replacement in human B-cells was found in
a human leukemia cell line, EU12, by detection of RAG-mediated
cRSS double-stranded DNA breaks (DSBs) and by amplification
of different Vh replacement excision circles (37). The detection of
DSBs at the Vh3-cRSS borders in human bone marrow imma- 
ture B-cells provided the first evidence for the natural occur- 
rence of Vh replacement during normal B-cell development in 
humans (37). The occurrence of Vh replacement in bone marrow 
immature B-cells is consistent with the observation that RAG1 
and RAG2 genes can be reinduced in these cells to catalyze IgL 
gene editing (24, 38, 39). Our recent studies showed that Vh 
replacement occurs in the newly immigrated immature B-cells 
in the peripheral blood of healthy donors, which can be further
induced through BCR-mediated signaling (40). The
cRSS-mediated Vh replacement is of particular interest because
the cRSS motifs are found in 40 out of 44 human Vh germline 
genes and in the majority of mouse Vh germline genes (22, 41). 
Vh replacement renews almost the entire Vh gene coding region 
but retains a short stretch of nucleotides as a Vh replacement 
footprint at the Vh-Dh junction (37). Such footprints can be
used to identify Vh replacement products through analysis of
IgH gene sequences. The initial analyses of 417 human IgH gene
sequences estimated that Vh replacement products contribute 
to about 5% of the normal IgH repertoire (37). Interestingly, 
analyses of the amino acids encoded by the Vh replacement 
footprints revealed that these footprints preferentially contribute 
charged amino acids into the IgH CDR3 regions, which is dif- 
ferent from the low frequency of charged amino acids encoded 
by human germline Dh genes or N region sequences added by 
TdT (37).

To explore the biological significance of Vh replacement, we 
developed a Java-based computer program and analyzed 61,851 
human IgH gene sequences from the NCBI database to determine 
the distribution of Vh replacement products. 

MATERIALS AND METHODS 

DEVELOPMENT OF THE Vh REPLACEMENT FOOTPRINT ANALYZER
PROGRAM 

The Vh replacement footprint analyzer (VhRFA) program was 
developed using the NetBeans 7.01 IDE with Java development 
kit (JDK) and tested under Windows, Mac OS X, and Ubuntu 
Linux (42). The reference human Vh germline gene sequences 
were downloaded from the IMGT database to generate the library 
of Vh replacement footprints with different lengths. For the initial 
test of the VhRFA program, we used 417 IgH sequences that had 
been analyzed in our previous study to manually identify potential 
Vh replacement products (37, 43). The 61,851 human IgH gene 
sequences were downloaded from the NCBI database on April 
20, 2011.

ANALYSIS OF IgH GENE SEQUENCES AND IDENTIFICATION OF 
POTENTIAL Vh REPLACEMENT PRODUCTS USING THE VhRFA
PROGRAM 

The IgH gene sequence files from NCBI database were first con- 
verted into FASTA files and uploaded to the VhRFA program. 
The Vh, Dh, and Jh germline gene usages were assigned by auto- 
matic submission of sequences in batches to the IMGT/V-Quest 
program (http://www.imgt.org/IMGT_vquest/share/textes/) (44) 
and the results were exported as Microsoft Excel files to a local 
computer. Identical IgH gene sequences in the original NCBI data- 
base were removed based on their Vh-Dh-Jh junctions and the 
remaining 39,438 unique human IgH gene sequences with iden- 
tifiable Vh, Dh, and Jh genes were further analyzed to identify 
potential Vh replacement products and calculate the frequencies 
of Vh replacement products in subsequent analyses. Briefly, the
IgH gene sequences with clearly identifiable Vh, Dh, and Jh genes
were searched for 7-, 6-, 5-, 4-, and 3-mer Vh replacement
footprint motifs in their Vh-Dh junction (N1) regions and Dh-Jh
junction (N2) regions. The frequency of Vh replacement products
was calculated by dividing the number of IgH genes with Vh
replacement footprints in the N1 regions by the total number of
unique IgH gene sequences. IgH genes with 7-, 6-, 4-, and 3-mer
Vh replacement footprint motifs within their N1 regions were also
analyzed and discussed. The positive predictive values, with 95%
confidence interval, of using the 6-, 5-, 4-, and 3-mer Vh
replacement footprint motifs to assign Vh replacement products are
68, 59, 54, and 52%, respectively. In the following comparisons,
Vh replacement products mainly refer to IgH genes with a 5-mer
Vh replacement footprint within their N1 regions.
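
The footprint search described above can be sketched in a few lines. The block below is a minimal Python illustration and not the VhRFA code itself, which is Java-based. All function names, the toy germline tails, and the toy N1 regions are hypothetical. The sketch also assumes that every contiguous k-mer of the sequence following the cRSS counts as a candidate footprint motif; the exact motif definition used by VhRFA may differ.

```python
def kmers(seq, k):
    """Return the set of all contiguous k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_footprint_library(vh_tails, k):
    """Union of k-mers taken from the post-cRSS tail of every germline Vh gene.

    vh_tails maps a gene name to the nucleotides that follow the cRSS
    (toy values below; real tails would come from IMGT germline sequences).
    """
    library = set()
    for tail in vh_tails.values():
        library |= kmers(tail.lower(), k)
    return library


def has_footprint(n1_region, library, k):
    """True if the N1 (Vh-Dh junction) region contains any library k-mer."""
    return not kmers(n1_region.lower(), k).isdisjoint(library)


def replacement_frequency(n1_regions, library, k):
    """Fraction of unique sequences whose N1 region carries a k-mer footprint."""
    if not n1_regions:
        return 0.0
    hits = sum(has_footprint(n1, library, k) for n1 in n1_regions)
    return hits / len(n1_regions)


# Hypothetical tails that follow the cRSS in two germline Vh genes.
toy_tails = {"VhA": "cgagaga", "VhB": "cgaaaga"}
library_5mer = build_footprint_library(toy_tails, 5)

# Hypothetical N1 regions from four unique IgH sequences.
toy_n1 = ["acgtcgagatt", "ttgacc", "ccgaaagt", "gggtcc"]
frequency = replacement_frequency(toy_n1, library_5mer, 5)
```

The denominator in `replacement_frequency` is the number of unique sequences, which mirrors the frequency definition given in the text.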

The distribution of Vh replacement products in IgH genes
derived from different keyword sub-categories was analyzed
based on the information linked to each sequence in the NCBI
GenBank files. The frequencies of Vh replacement products with
pentameric footprints were used for all these comparisons. For
mutational analysis, the IgH gene sequences had >80% nucleotide
similarity to the assigned germline Vh gene sequences.

STATISTICAL ANALYSIS 

Statistical significance was determined by using either the
two-tailed Chi square test with Yates' correction or the unpaired t-test.
p < 0.05 is considered statistically significant and p < 0.0001 is
considered extremely statistically significant.
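
For readers who want to reproduce this kind of comparison, the two-tailed Chi square test with Yates' correction on a 2x2 table can be sketched as below. This is an illustrative Python implementation, not the software used in the study; the function name is hypothetical and the example counts are invented.

```python
import math


def chi_square_yates(a, b, c, d):
    """Chi-square statistic with Yates' continuity correction for a 2x2 table.

    Table layout:
                    with footprint   without footprint
        group 1          a                 b
        group 2          c                 d
    Returns (statistic, two-tailed p-value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    # Yates' correction subtracts n/2 from |ad - bc|, floored at zero.
    corrected = max(abs(a * d - b * c) - n / 2.0, 0.0)
    statistic = n * corrected ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Invented counts: 30 of 100 sequences with footprints vs 10 of 100.
stat, p = chi_square_yates(30, 70, 10, 90)
```

With these invented counts the p-value falls between the two thresholds defined above, so the difference would be reported as statistically significant but not extremely so.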

RESULTS 

DIFFERENTIAL USAGE OF GERMLINE Vh, Dh, AND Jh GENES IN HUMAN
IgH GENE SEQUENCES 

We have developed a Java-based VhRFA computer program to
analyze large numbers of IgH gene sequences and to identify
potential Vh replacement products (42). In the current study, 61,851
human IgH gene sequences were downloaded from the NCBI
database. The initial analysis showed that 54,970 IgH genes have
identifiable Vh, Dh, and Jh gene segments. After removal of
duplicate IgH sequences, the remaining 39,438 unique IgH genes with
identifiable Vh, Dh, and Jh genes were further analyzed. The
usages of the Vh, Dh, and Jh germline genes in these sequences
represent a combined view of the human IgH repertoire from
many studies (Figure 1). The usages of all the 44 functional human
germline Vh genes were confirmed in this dataset (Figure 1A); the
frequencies of individual Vh germline gene usage varied considerably.
Among the different families of Vh genes, the Vh3 family was
predominantly utilized, followed by the Vh4 and Vh1 families
(Figure 1A). Such results are consistent with previous analyses
of small groups of IgH gene sequences. Among individual Vh
genes, the Vh3-23 gene was used the most frequently, in 9536 IgH
genes (25%). The Vh4-28 gene was used far less frequently and was
found in only 13 IgH rearrangements (0.03%). The differential
usages of individual Vh germline genes did not seem to correlate
with their relative location within the IgH locus (Figure 1A).
Within the IgH locus, the Vh1-24, Vh2-26, and Vh3-30 genes are
located very close to the Vh3-23 and Vh4-28 genes. However, the
frequency of Vh3-23 gene usage is only 4-fold higher than that
of the Vh3-30 gene, but is 50- and 80-fold higher than those of the
Vh1-24 and Vh2-26 genes, respectively (Figure 1A).

Among different Dh genes, the Dh3 gene family was predominantly
used, in 35% of IgH genes, in which the Dh3-10, Dh3-3,
and Dh3-22 genes were used frequently; the Dh1 gene family
was used less frequently (Figure 1B). Among Jh germline genes,
the Jh4 gene was predominantly used, followed by the Jh6 gene
(Figure 1C). These results are consistent with previous individual
reports based on small numbers of IgH sequences. Taken together, this
analysis provides a comprehensive view of the existing human IgH
gene sequences in the NCBI database.

FIGURE 1 | The comprehensive analysis of human IgH repertoire. The
61,851 human IgH gene sequences were downloaded from the NCBI database
in May 2012. The sequences were first analyzed for their Vh, Dh, and Jh gene
usage using the IMGT/V-Quest program and the identical sequences were
removed. The frequencies of Vh (A), Dh (B), and Jh (C) germline gene usages
in the 39,438 unique human IgH gene sequences are shown.

IDENTIFICATION OF Vh REPLACEMENT PRODUCTS USING THE VhRFA
PROGRAM 

To identify potential Vh replacement products in a large number
of IgH gene sequences, the VhRFA program first generates
libraries of potential Vh replacement footprints of different
lengths based on the Vh gene 3' end sequences following
the conserved cRSS sites of all the functional human Vh germline
genes (Tables S1 and S2 in Supplementary Material). Then, the
VhRFA program uses these libraries to search for the presence of
Vh replacement footprint motifs with specified lengths at the
Vh-Dh junction (N1) regions or the Dh-Jh junction (N2) regions
of IgH genes. As an initial test of the newly developed VhRFA
program, we reanalyzed the 417 human IgH gene sequences that
had been manually analyzed to identify potential Vh replacement
products in a previous study (37). The VhRFA program efficiently
identified Vh replacement footprint motifs with 3, 4, 5, 6, or 7
nucleotides in both the N1 and N2 regions (Table 1, top). The
frequencies of 3-, 4-, or 5-mer Vh replacement footprint motifs in
the N1 regions are significantly higher than those in the N2 regions
(Table 1, top), indicating that the addition of such motifs in the
N1 region is not a random event. Based on the identification of
5-mer Vh replacement footprints, 7.3% of the IgH gene sequences
can be assigned as potential Vh replacement products. Further
review of these IgH genes confirmed the identified pentameric
Vh replacement motifs within the Vh-Dh junctions (Table 2, N1
regions). If we consider the 4- or 3-mer Vh replacement footprints
within the N1 regions, 25 or 54.7% of IgH genes can be assigned
as potential Vh replacement products, respectively (Table 1; Table
S3 in Supplementary Material). These results are consistent with
our previous manual assignment of Vh replacement products
in this group of IgH genes and provide the first validation of the
VhRFA program.

CONTRIBUTION OF Vh REPLACEMENT PRODUCTS TO THE HUMAN IgH
REPERTOIRE 

With the help of the VhRFA program, we searched for potential
Vh replacement products in the 39,438 unique human IgH
sequences with identifiable Vh, Dh, and Jh genes from the NCBI
database. We first compared the frequencies of Vh replacement
footprint motifs with 3, 4, 5, 6, or 7 nucleotides within the N1
and N2 regions (Table 1, bottom). The frequencies of 3-, 4-, 5-, 6-,
and 7-mer Vh replacement footprint motifs in the N1 regions
are significantly higher than those in the N2
regions (Table 1, bottom, p < 0.0001), indicating that the presence
of such motifs in the N1 region is likely contributed by Vh
replacement rather than random nucleotide addition. Among these IgH
gene sequences, 12.1% contain 5-mer Vh replacement
footprint motifs and can be assigned as potential Vh replacement
products (Table 1, bottom). This number indicates a significant
contribution of Vh replacement products to the diversification
of the human IgH repertoire. If we consider the 4- and 3-mer Vh



Table 1 | Frequencies of Vh replacement footprint motifs in the N1 and N2 regions of human IgH genes. 
a Unique IgH gene sequences with identifiable Vh, Dh, and Jh genes were analyzed. These IgH sequences contain both functional and non-functional IgH
rearrangements. N1, Vh-Dh junction regions; N2, Dh-Jh junction regions.

b The frequencies of Vh replacement footprint motifs with different lengths within the N1 or the N2 regions were compared by two-tailed Chi square test with Yates'
correction. p < 0.05 is considered statistically significant and p < 0.0001 is considered extremely statistically significant.

c The frequency of Vh replacement products was calculated using the number of sequences with Vh replacement motifs of different lengths in the N1 regions divided
by the total number of unique IgH gene sequences.

d These IgH gene sequences had been analyzed manually for Vh replacement products (37).

e All the three 6-mer footprints within the N2 regions could be due to second Dh gene segments.

f Not applicable.

g The human IgH gene sequence dataset was downloaded from the NCBI database on April 20, 2011.
Table 2 | Identification of potential Vh replacement products in human IgH sequences.

| Accession No. | Vh gene | Vh | P | N1 (a) | P | Dh | CDR3 (aa) (b) |
|---|---|---|---|---|---|---|---|
| AF235818 | VH1-69*06 | tgtgcgaga | | gaagcaaagtttgagaag | | gctgccaaacc | AREAKFEKAAKPYYYYGMDV |
| AF235903 | VH3-33*01 | tgtgcgagaga | | cagac | | agctgctgctgg | ARDRQLLLGYGMDV |
| AF235823 | VH3-11*01 | tgtgcgagaga | | caccctcacgaaatcacc | | ttacgatttttggagtggttattat | ARDTLTKSPYDFWSGYYGLTYYYYGMDV |
| AF235857 | VH3-23*01 | tgtgcgaaaga | t | gaagaggag | | tattgtggtagaaccagctgct | AKDEEEYCGRTSCFCMDV |
| AF235601 | VH1-18*01 | tgtgcgagaga | | cgacggacgggcggcgg | | attgtagtggtggtagctgctactcc | ARDDGRAADCSGGSCYSDY |
| AF235609 | VH3-33*05 | tgtgcgaga | | agagggccaatcc | | atatcagcagctgg | ARRGPIHISSWYYYYYGMDV |
| AF235766 | VH3-30*03 | tgtgcga | | aacagtggacgc | | atattgtgg | AKQWTHIVVFDI |
| AF235806 | VH3-15*01 | tgt | | cattcggggggtagacc | | gtatagcagtggctggt | HSGGRPYSSGWSPKWYYGMDV |
| AF235787 | VH3-23*01 | tgtgcgaaaga | tc | aacctcgaaag | | gcagcagctggta | AKDQPRKAAAGMYYYGMDV |
| AF235574 | VH4-59*07 | tgtgcgaga | | cgaaat | | tattactatgatagtagtggt | ARRNYYYDSSGPDAFDI |
| AF235726 | VH1-69*06 | tgtgcg | | gggagaggagagtat | | ggctatagcagcagctgg | AGRGEYGYSSSWFDY |
| AF235869 | VH2-70*10 | tgtgc | | cagaca | | atattgtggtggtgactgct | ARQYCGGDCCSDY |
| AF235809 | VH4-39*07 | tgtgcga | | caaaatc | c | gtattacgatattttgactggttatt | ATKSVLRYFDWLLPSYYYYYGMDV |
| AF235610 | VH3-30-3*01 | tgtgcgaga | | gatgaaag | | tagcagtggctgg | ARDESSSGWYWYFDL |
| AF235541 | VH3-48*03 | tgtgcgagaga | tc | gacgcgaccggat | | taactgggga | ARDRRDRINWGYYYGMDV |
| AF235758 | VH2-70*01 | tgtgcacggata | | agggccctagacgta | | aactgggga | ARIRALDVNWGGWYFDL |
| AF235544 | VH3-66*01 | tgtgcgagaga | tc | gagac | | tacgatttttggagtggtt | ARDRDYDFWSGYAFDI |
| AF235692 | VH3-33*01 | tgtgcgagaga | | gggggagattgat | | catattgtggtggtgactgctatccc | AREGEIDHIVVVTAIPNWFDP |
| AF235764 | VH1-3*01 | tgtgcgagag | | cgaga | ct | aggatattgtagtggtggtagctgctactcc | ARARLGYCSGGSCYSGGFDY |
| AF235793 | VH1-69*02 | tgtgcgaga | | gatctcacttacgggc | | attttgactggtta | ARDLTYGHFDWLPPHYYYYYGMDV |
| AF235897 | VH3-21*01 | tgtgcgaga | | tcaacggcatca | | tacggtgactac | ARSTASYGDYDNWFDP |
| AF235796 | VH3-30*03 | tgtgcgaaaga | tc | ctacgggaaccacaaacttatctcccttagggcg | | agcagcagct | AKDPTGTTNLSPLGRAAAYVYYYYYGMDV |
| AF235588 | VH4-59*08 | tgtgcga | | cccatcggat | | taactgggga | ATHRINWGFDY |
| AF235907 | VH5-51*01 | tgtg | | tgcgagacagctcg | | tacagctatggtt | VRDSSYSYGLSNLYYYGMDV |
| AF235842 | VH3-23*01 | tgtgcgaaaga | t | ttcccagacgagcccgg | | gtaccagctgctatac | AKDFPDEPGYQLLYGSLDY |
| AF235812 | VH5-a*01 | tgtgcgag | | ggccgaaatcttatccgg | | agcagtggc | ARAEILSGAVAPRDY |
| AF235657 | VH5-51*01 | tgtgcgagac | | gagaacaacc | | tgggacccact | ARREQPGTHLNY |
| AF235626 | VH3-21*01 | tgtggga | | aagaggacc | | ggagttatta | GKEDRSYYDY |
| AF235565 | VH3-23*01 | tgt | | accacagacccggccttgaggacctc | | actgctggggt | TTDPALRTSLLGSFDY |

a The identified Vh replacement footprints are underlined and highlighted in red in the N1 regions.

b The amino acids encoded by the identified Vh replacement footprints are underlined in the amino acid sequences of the CDR3 regions.
replacement footprint motifs, 33.9 and 55.8% of IgH genes, respectively, can be
assigned as potential Vh replacement products (Table 1, bottom).

Within the large number of IgH genes, there are 3818 non- 
functional IgH gene sequences and 687 of them contain the 5-mer 
Vh replacement footprint motifs in their N1 regions, which can
be assigned as potential Vh replacement products. The frequency 
of Vh replacement products in non-functional IgH genes (18%) 
is extremely statistically significantly higher than that in the over- 
all functional IgH genes (p < 0.0001, two-tailed Chi square test 
with Yates' correction). Identification of Vh replacement prod- 
ucts in non-functional IgH genes fulfills the prediction that Vh 
replacement is a random process that can generate both func- 
tional and non-functional IgH rearrangement products. Taken 
together, these results uncovered a previously unrealized contribu- 
tion of Vh replacement products to the diversification of human 
IgH repertoire. 

DISTRIBUTION OF Vh REPLACEMENT PRODUCTS IN IgH GENES USING
DIFFERENT Vh GENES

Using the VhRFA program, we further analyzed the distribution
of Vh replacement products in IgH genes using different Vh genes.
The frequencies of Vh replacement products in IgH genes using
different Vh germline genes differ (Figure 2). For example,
the frequencies of Vh replacement products in IgH genes
using the Vh2-5, Vh3-30, Vh3-30-3, Vh1-69, and Vh3-34 genes are
23.88, 19.12, 16.64, 14.28, and 13.13%, respectively, which are
significantly higher than that in IgH genes using the Vh6-1
gene (p < 0.0001, two-tailed Fisher's exact test) (Figure 2). As an
internal control, 7.56% of IgH genes using the Vh6-1 gene have
5-mer Vh replacement footprints within their N1 regions, which
is statistically significantly lower than that in the overall IgH gene
sequences (p = 0.0004, two-tailed Fisher's exact test).

Vh REPLACEMENT PRODUCTS ARE HIGHLY ENRICHED IN IgH GENES
DERIVED FROM PATIENTS WITH AUTOIMMUNE DISEASES OR VIRAL 
INFECTIONS 

The overall frequency of Vh replacement products in the 39,438 
unique IgH genes from the NCBI database (12.1%) is much 
higher than what was estimated in the 417 IgH genes obtained 
from healthy donors. We reasoned that the majority of IgH
gene sequences deposited at the NCBI database were derived
from diseased subjects, which may have higher frequencies of
Vh replacement products. Next, we investigated the distribution 
of Vh replacement products in IgH genes derived from differ- 
ent disease sub-categories. Using the keyword analysis function 
within the VhRFA program, we can correlate the frequencies of 
Vh replacement products with different sub-categories of IgH 
gene sequences from the NCBI database. For example, the
frequency of Vh replacement products in 558 IgH genes derived
from healthy donors is 8.6% (Figure 3), which is similar to 
the result obtained from previous analysis of the 417 IgH gene 
sequences from healthy donors. Interestingly, the frequencies of 
Vh replacement products in IgH genes derived from subjects with 
different autoimmune diseases, such as allergic rhinitis, RA, and 
SLE are statistically significantly higher than that in the healthy 
donors (Figure 3, p < 0.05, two-tailed Chi square test with Yates' 
correction; Table S4 in Supplementary Material). The frequencies 
of Vh replacement products are further enriched in IgH genes 
derived from RA synovium and in IgH genes encoding rheuma- 
toid factors, suggesting that B-cells expressing Vh replacement 
products are positively selected in the RA synovium to encode 
rheumatoid factors (Figure 3, p < 0.05, two-tailed Chi square test 
with Yates' correction; Table S4 in Supplementary Material). Sim- 
ilarly, Vh replacement products are highly enriched in IgH genes 
derived from SLE plasmablasts (Figure 3, p < 0.05, two-tailed 
Chi square test with Yates' correction; Table S4 in Supplementary 
Material), suggesting that these enriched Vh replacement products 
contribute to the production of autoAbs in SLE. 

The accumulation of Vh replacement in IgH genes derived 
from patients with different autoimmune diseases suggested that 
Vh replacement products may contribute to the production of 
autoAbs. Indeed, further analyses showed that Vh replacement 
products are statistically significantly enriched in IgH genes encoding
rheumatoid factors, anti-Rh(D) Abs, and anti-acetylcholine
receptor Abs (Figure 3, p < 0.05, two-tailed Chi square test with 
Yates' correction; Table S4 in Supplementary Material). 

To our surprise, the frequencies of Vh replacement products
are significantly elevated in IgH genes derived from different
viral infections. For example, the frequencies of Vh replacement
products in IgH genes derived from HIV- and HCV-infected
patients are statistically significantly higher than that in healthy
donors (Figure 3, p < 0.05, two-tailed Chi square test with Yates'
correction; Table S4 in Supplementary Material). Further analyses
showed that Vh replacement products account for about
30% of IgH genes encoding anti-HCV glycoprotein E2 Abs or
anti-HBVsAg Abs. Such frequencies are statistically significantly higher
than that in healthy donors (Figure 3, p < 0.05, two-tailed Chi
square test with Yates' correction). Taken together, these results
showed that Vh replacement products are highly enriched in IgH
genes derived from patients with different autoimmune diseases
and viral infections.

FIGURE 2 | Distribution of Vh replacement products in IgH genes using
different Vh genes. The frequencies of Vh replacement products in functional
IgH genes using each Vh germline gene are compared with that in IgH genes
using the Vh6-1 gene. **p < 0.0001, *p < 0.05. The result for IgH genes using
the Vh6-1 gene is highlighted in the box and the frequency of Vh replacement
products in all the IgH genes is indicated by the dashed line.

FIGURE 3 | Vh replacement products are significantly enriched in IgH
genes derived from autoimmune diseases or viral infections and in IgH
genes encoding autoreactive or anti-viral Abs. Frequencies of Vh
replacement products in IgH gene sequences derived from different
sub-categories were analyzed based on the identification of pentameric Vh
replacement footprints within their V-D junctions. The frequencies of Vh
replacement products in IgH genes derived from different autoimmune
diseases and viral infections, or in IgH genes encoding autoAbs, anti-viral
Abs, or anti-bacterial Abs were compared with that from healthy controls.
The number of analyzed IgH gene sequences in each subcategory is
indicated (n). The arrowhead indicates the overall frequency of Vh
replacement products (12.1%) in the 39,438 unique human IgH sequences.
Statistical significance was determined using a two-tailed Chi square test
with Yates' correction. *p < 0.05 is considered statistically significant and
**p < 0.0001 is considered extremely statistically significant.

Vh REPLACEMENT ELONGATES THE IgH CDR3

Vh replacement renews almost the entire Vh coding region. Due
to the location of the cRSS site, a short stretch of nucleotides
remains as a Vh replacement footprint at the newly formed
Vh-Dh junction after the Vh replacement process (37). Such
Vh replacement footprints can contribute up to two amino acids
to the IgH CDR3 and thereby elongate it. The average CDR3
length of the identified Vh replacement products is 18.2 ± 5.0
aa (n = 4417), which is significantly longer
than that of the non-Vh replacement products (15.4 ± 4.4 aa;
p < 0.0001, unpaired t-test) (Figure 4). This result
confirmed that Vh replacement elongates the IgH CDR3 region.
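
To give a sense of the size of this difference, an unpaired t statistic can be computed directly from the reported summary values. The sketch below uses Welch's unequal-variance form, which is an assumption: the text only states "unpaired t-test". The sample size of the non-Vh replacement group is not given in the text; the value used here (35,021, i.e., 39,438 minus 4417) is a guess made for illustration only.

```python
import math


def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's unpaired t statistic and approximate degrees of freedom,
    computed from group means, standard deviations, and sample sizes."""
    v1 = sd1 ** 2 / n1
    v2 = sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Means, SDs, and n1 are from the text; n2 is an assumed value.
assumed_n2 = 39438 - 4417
t_stat, dof = welch_t_from_summary(18.2, 5.0, 4417, 15.4, 4.4, assumed_n2)
```

Under these assumptions the t statistic is large and positive, in line with the direction and strength of the reported difference.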

THE Vh REPLACEMENT FOOTPRINTS PREFERENTIALLY ENCODE
CHARGED AMINO ACIDS 

Our previous analysis showed that the Vh replacement footprints
preferentially encoded charged amino acids in the IgH CDR3
regions (37, 45). This is likely predetermined by the conservation
of the amino acid sequence at the 3' ends of Vh germline genes.
Here, analysis of the amino acids encoded by the identified
pentameric Vh replacement footprints in the 4417 Vh replacement
products showed that 57% of them are charged amino acids.
This frequency is significantly higher than that in the N1
regions of non-Vh replacement products (25%)
(Figure 5A, p < 0.0001, two-tailed Chi square test with Yates'
correction). Detailed analyses showed that the frequencies of K, R,
D, and E residues encoded by the Vh replacement footprints
are significantly higher than their usage in the N1
regions of non-Vh replacement products (Figure 5B, p < 0.05,
two-tailed Chi square test with Yates' correction). These results
confirmed our previous prediction that Vh replacement footprints
preferentially contribute charged amino acids to the IgH CDR3
regions.
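
The comparison above is a 2x2 test of charged versus uncharged residues in two populations. The sketch below illustrates a two-tailed Chi square test with Yates' correction in plain Python. The counts are hypothetical, chosen only to mirror the reported 57% and 25%; the actual totals are given in Figure 5 and are not reproduced here. The helper names are illustrative.

```python
import math

CHARGED = set("KRDE")  # the four charged residues named in the text

def charged_counts(residues):
    """Count (charged, uncharged) amino acids in a one-letter-code string."""
    charged = sum(1 for aa in residues if aa in CHARGED)
    return charged, len(residues) - charged

def yates_chi_square(a, b, c, d):
    """Chi square test with Yates' correction for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    # Yates' correction subtracts n/2 from |ad - bc|, floored at zero.
    corrected = max(abs(a * d - b * c) - n / 2, 0)
    chi2 = n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# HYPOTHETICAL counts (charged, uncharged), not the study's data.
footprint = (570, 430)   # 57% charged
n1_region = (250, 750)   # 25% charged
chi2, p = yates_chi_square(*footprint, *n1_region)
print(f"chi2 = {chi2:.1f}, significant at 0.0001: {p < 0.0001}")
```

In practice, `charged_counts` would be applied to the concatenated footprint-encoded residues and to the N1-encoded residues to produce the two rows of the table.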

VH REPLACEMENT PRODUCTS ARE POSITIVELY SELECTED DURING
AUTOIMMUNE OR ANTI-VIRAL RESPONSES

Although charged amino acids within IgH CDR3 are not well
tolerated during Ab repertoire development, they are frequently
found within the IgH CDR3 regions of autoreactive or anti-viral
Abs, where they may play important roles in binding charged self
or viral antigens, respectively. Further analyses of Vh replacement
products derived from different autoimmune diseases or viral
infections showed that the identified Vh replacement footprints
predominantly encode charged amino acids (Figure 6A). Detailed
analyses showed that the identified Vh replacement footprints in
IgH genes encoding anti-DNA/histone Abs or rheumatoid factors
encoded significantly lower frequencies of D, E, N, and Q
residues, that is, of the negatively charged residues and their
uncharged amide counterparts (Figure 6B, p < 0.05, two-tailed
Chi square test with Yates' correction).

The identified Vh replacement products have mutation rates
similar to those of the non-Vh replacement products, whether
derived from healthy donors or from patients with autoimmune
diseases or viral infections (Figure 6C). As negative controls,
both Vh replacement products and non-Vh replacement products in
neonatal IgH gene sequences have much lower mutation rates
(Figure 6C). The accumulation of mutations within these Vh
replacement products indicates that the Vh replacement products
enriched in autoimmune diseases or viral infections had been
positively selected.

DISCUSSION 

To determine the distribution of Vh replacement products in
human IgH genes and to explore the biological significance of
Vh replacement products in human antibody diversification and
diseases, we developed a Java-based computer program, VhRFA,
to analyze large numbers of IgH gene sequences and to identify
potential Vh replacement products (42). Previous analyses of the
IgH gene repertoire have provided important insights regarding
the developmental process and function of B lineage cells. Due
to its tremendous diversity, the complete human IgH repertoire
cannot be experimentally determined. Within the NCBI database,
there are 61,851 human IgH gene sequences (May, 2012 version).
The initial analysis of the Vh, Dh, and Jh gene usages in the 61,851
human IgH gene sequences provides a comprehensive view of the
human IgH repertoire. In this dataset, the usage of every
functional Vh germline gene was confirmed, although their usages
differ dramatically.

FIGURE 4 | The average CDR3 length of identified Vh replacement
products is significantly longer than that of non-Vh replacement
products. The distribution of IgH genes with different CDR3 lengths is
shown in the bar graph. The average CDR3 length of Vh replacement
products (black bars) was compared to that of non-Vh replacement
products (white bars). Statistical significance was determined by using an
unpaired t-test. **p < 0.0001 is considered extremely statistically
significant.

FIGURE 5 | Vh replacement footprints preferentially contribute charged
amino acids to the IgH CDR3 regions. (A) Frequencies of charged and
uncharged amino acids (aa) in the N1 regions of non-Vh replacement
products were compared with those encoded by the Vh replacement
footprints. (B) The usages of different amino acids in the N1 regions of
non-Vh replacement products (white bars) or encoded by the Vh
replacement footprints (black bars) were analyzed and shown in the bar
graph. The total number of amino acids analyzed for each population is
indicated. Statistical significance was determined using a two-tailed Chi
square test with Yates' correction. *p < 0.05 is considered statistically
significant. **p < 0.0001 is considered extremely statistically significant.



Using the VhRFA program, we identified Vh replacement products
and analyzed their distributions in the 39,438 unique IgH
sequences. Based on the identification of pentameric Vh
replacement footprint motifs within the Vh-Dh junctions, 12.1% of
the IgH genes can be assigned as potential Vh replacement products.
These results confirmed our previous estimation that Vh
replacement products contribute to the diversification of the
human IgH repertoire. Interestingly, the frequencies of Vh
replacement products in IgH genes using the Vh2-5, Vh3-30,
Vh3-30-3, Vh3-49, Vh1-69, and Vh3-34 genes are significantly
higher than that in IgH genes overall. In contrast, the frequency
of Vh replacement products in IgH genes using the Vh6-1 gene is
significantly lower than that in IgH genes overall. Among the
non-functional IgH genes, 18% contain the pentameric
Vh replacement footprints and can be assigned as potential Vh
replacement products. These results confirmed the prediction that
Vh replacement is a random process that can generate both
functional and non-functional IgH rearrangements. Moreover, the
high frequency of Vh replacement products in non-functional
IgH genes suggested that Vh replacement products were
negatively selected during B-cell development. Based on this
reasoning, the frequency of Vh replacement products in the
non-functional IgH genes may represent the true frequency of Vh
replacement during early stages of B-cell development, because
these non-functional IgH rearrangements cannot encode BCRs and
had not been selected during B-cell development.
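
The footprint-based assignment described above amounts to searching each Vh-Dh junction for short motifs that match the 3' ends of Vh germline genes. The internals of VhRFA are not described in this section, so the following is only an illustrative sketch of such a motif search. The function names and the toy sequences are hypothetical and are not real germline data.

```python
def germline_end_motifs(germline_3prime_ends, k=5):
    """Collect every k-mer found in the supplied germline 3' end sequences."""
    motifs = {}
    for gene, seq in germline_3prime_ends.items():
        for i in range(len(seq) - k + 1):
            motifs.setdefault(seq[i:i + k], set()).add(gene)
    return motifs

def find_footprints(junction, motifs, k=5):
    """Return (position, k-mer, matching genes) for each motif hit in a junction."""
    hits = []
    for i in range(len(junction) - k + 1):
        kmer = junction[i:i + k]
        if kmer in motifs:
            hits.append((i, kmer, sorted(motifs[kmer])))
    return hits

# HYPOTHETICAL toy sequences for illustration only.
toy_ends = {"geneA": "TGTGCGAGAGA", "geneB": "TGTGCAAGAGA"}
motifs = germline_end_motifs(toy_ends, k=5)
hits = find_footprints("CCGAGAGGT", motifs, k=5)
for position, kmer, genes in hits:
    print(position, kmer, genes)
```

A sequence would then be counted as a potential Vh replacement product when its junction yields at least one hit. Setting `k` to 4 or 3 corresponds to the tetrameric and trimeric motifs considered later in the Discussion.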

Due to the location of the cRSS site, a short stretch of
nucleotides has the potential to remain as a Vh replacement
footprint at the Vh-Dh junctions following the Vh replacement
process (25, 37, 46). The leftover Vh replacement footprints will
elongate the IgH CDR3 regions (25, 37, 46). Analyses of the
identified 4788 Vh replacement products showed that the average
CDR3 length of the identified Vh replacement products is 2.8 aa
longer than that of non-Vh replacement products. Previously, it
surprised us that the identified Vh replacement footprints
preferentially encode charged amino acids within the IgH CDR3
regions (22, 37, 46). Recent analyses showed that the positions of
the cRSS and the high frequencies of charged amino acids encoded
by the nucleotides that follow it are highly conserved in IgH genes
from different vertebrates (47). In the current study, 57% of the
identified Vh replacement footprints encoded charged amino acids
in the IgH CDR3 regions. Normally, charged amino acids within IgH
CDR3 are not well tolerated during antibody repertoire development,
probably because charged residues may generate autoAbs. Indeed,
our analysis revealed that Vh replacement products are
significantly enriched in IgH genes derived from patients with
different autoimmune diseases, including RA, allergic rhinitis, and
SLE, or in IgH genes encoding different autoAbs such as rheumatoid
factor, anti-rhesus D antigen, and anti-acetylcholine receptor Abs.
Our recent analyses of a large number of mouse IgH genes also
showed that Vh replacement products are enriched in IgH genes
derived from autoimmune-prone mice (48). These results suggested
that Vh replacement products contribute to the generation of
autoantibodies in both human and mouse.

FIGURE 6 | Vh replacement footprints preferentially contribute charged
amino acids to the CDR3 regions of IgH genes derived from
autoimmune diseases and viral infections. (A) Frequencies of charged and
uncharged amino acids encoded by the Vh replacement footprints in IgH
genes derived from autoimmune diseases and viral infections. n, total number
of amino acids analyzed in each subcategory. (B) Frequencies of negatively
and positively charged residues encoded by the Vh replacement footprints in
IgH genes derived from autoimmune diseases or viral infections. Statistical
significance was determined using a two-tailed Chi square test with Yates'
correction. *p < 0.05 is considered statistically significant. (C) Comparison of
overall somatic mutation rates (%) within the Vh region of Vh replacement
products versus non-Vh replacement products in different subcategories.

Another important and interesting finding from this analysis
of a large number of IgH gene sequences is that the frequencies of
Vh replacement products are significantly elevated in IgH genes
derived from various viral infections, including HIV and HCV, and
in IgH genes encoding Abs against HCV glycoprotein E2 or HBV
surface antigens. Our recent studies showed that Vh replacement
products are highly enriched in IgH genes encoding different
subgroups of anti-HIV antibodies, especially in CD4i and PGT
antibodies (49). These results suggested that Vh replacement
products may contribute to the generation of anti-viral Abs. The
majority of the Vh replacement footprints identified from
anti-viral Abs also encode charged amino acids, which may be
important for binding charged viral antigens. Moreover, the
accumulation of mutations in these Vh replacement products
indicated that the Vh replacement products enriched in patients
with viral infections are positively selected during anti-viral
responses. The identification of Vh replacement products in
autoimmune diseases and anti-viral responses suggested a potential
link between viral infections and the pathogenesis of autoimmune
diseases. It has long been postulated that chronic viral infections
contribute to autoimmunity. However, clear examples of Abs against
viral antigens that cross-react with self-antigens have been found
in only a few cases (50, 51). Here, our results reveal a shared
pattern of accumulation of Vh replacement products in IgH genes
derived from autoimmune diseases and anti-viral responses.

Vh replacement was originally proposed as a receptor editing
mechanism to change unwanted IgH genes that are either
non-functional or encode autoreactive Abs. The enrichment of Vh
replacement products in IgH genes derived from autoimmune
diseases or encoding autoAbs is therefore particularly puzzling.
There are several possible mechanisms to explain this finding.
First, we have recently shown that crosslinking cell surface BCRs
induces Vh replacement in human immature B-cells (40). Thus, Vh
replacement recombination might be induced in immature B-cells
during either the anti-viral immune response or autoimmune
disease due to persistent antigen stimulation or chronic
inflammation. In support of this assumption, the number of newly
emigrated immature B-cells in the peripheral blood is increased
during inflammatory responses, and these mobilized immature
B-cells may continue to undergo Vh replacement recombination
ectopically. Second, an intrinsic feature of Vh replacement is that
it elongates the IgH CDR3 with charged amino acids. Vh replacement
products may therefore frequently encode autoAbs, which are
efficiently deleted during normal B-cell development. The observed
elevated frequencies of Vh replacement products in different
autoimmune diseases may reflect defective negative selection in
these diseased subjects. Moreover, Vh replacement occurring
ectopically may bypass the stringent negative selection in the bone
marrow and release Vh replacement products into the periphery.
Last, because Vh replacement generates IgH genes with long and
charged CDR3 regions, it is possible that Vh replacement products
are positively selected by viral antigens during anti-viral
responses to produce specific anti-viral Abs. In support of this
notion, the identified potential Vh replacement products encoding
anti-HIV antibodies all have very long CDR3 regions with multiple
charged amino acid residues (49). The accumulated mutations within
the Vh genes of the identified Vh replacement products in the
current study also indicated positive selection. However, the
leftover Vh replacement products generated during a chronic viral
infection may encode Abs that cross-react with self-antigens and
later contribute to autoimmunity. In fact, many cell surface
antigens and viral antigens are negatively charged, which may be a
reason for the selection of Vh replacement products with long and
charged CDR3 regions.

In our sequence-based analysis, the assignment of Vh replacement
is dependent on the identification of Vh replacement footprints
within the Vh-Dh junctions. Any deletion at the 3' end of Vh
genes or the 5' end of Vh replacement footprint motifs during the
primary or secondary IgH gene recombination, respectively, may
destroy the pentameric Vh replacement footprints. Therefore, it
is possible that this sequence-based study still underestimates
the frequency of Vh replacement products. Using the VhRFA program,
we extended our analysis of Vh replacement products to include
potential Vh replacement footprint motifs of different lengths.
For example, 33.9% of the IgH genes contain tetrameric Vh
replacement footprint motifs and 58.8% of IgH genes contain
trimeric Vh replacement footprint motifs. These results revealed a
significant contribution of Vh replacement products to the IgH
repertoire. Recent studies in mice carrying non-functional IgH
genes on both IgH alleles demonstrated that Vh replacement occurs
efficiently to generate almost normal numbers of B-cells with
diversified IgH repertoires (52). However, only about 20% of the
IgH gene sequences from that study contained residual Vh
replacement footprints. Therefore, the majority of IgH genes
generated through Vh replacement recombination have no leftover
Vh replacement footprints. Theoretically, 66% of IgH
rearrangements will be out of reading frame, and 44% of developing
B-cells may carry non-functional IgH rearrangements on both
alleles. If all of these B-cells are rescued by Vh replacement,
a minimum of 44% of the IgH genes might be generated through
Vh replacement recombination. Under this assumption, IgH genes
containing the tetrameric or the trimeric Vh replacement footprint
motifs in their N1 regions should also be considered as
potential Vh replacement products.
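
The two theoretical percentages quoted above follow from simple reading-frame arithmetic. The sketch below reproduces them under two assumptions that the text does not spell out: each rearrangement falls into one of the three reading frames with equal probability, and the two alleles rearrange independently.

```python
from fractions import Fraction

# ASSUMPTION: the three reading frames are equally likely, so two of the
# three possible outcomes of a rearrangement are out of frame.
p_out_of_frame = Fraction(2, 3)

# ASSUMPTION: the two IgH alleles rearrange independently of each other.
p_both_alleles_out = p_out_of_frame ** 2

print(f"out of frame on one allele: {float(p_out_of_frame):.3f}")
print(f"out of frame on both alleles: {float(p_both_alleles_out):.3f}")
```

Two-thirds corresponds to the 66% quoted in the text and four-ninths to the 44%. This simple model ignores other sources of non-functionality, such as in-frame stop codons.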

Like any sequence-based analysis program, the VhRFA program
also has its limitations. Although sequence motifs resembling the
3' end sequences of Vh genes can be identified in the N1 regions,
such motifs can also be identified within the N2 regions at
relatively lower frequencies. Theoretically, Vh replacement can
only leave footprints within the N1 region; Vh replacement
footprint-like motifs within the N2 regions can only be generated
by random nucleotide addition. For IgH genes using the Vh6-1
gene, which is the first Vh germline gene 5' to the Dh locus, there
should be no Vh replacement footprint-like motifs within the
Vh-Dh junctions, but the VhRFA program still identifies Vh
replacement footprint-like motifs within the Vh-Dh junctions of
7.56% of these sequences. We can only attribute such motifs to
random nucleotide addition.
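
How often random nucleotide addition alone would produce a footprint-like motif can be gauged by simulation. The following is a rough sketch, not part of the VhRFA program. It assumes uniformly distributed bases, which real N additions do not follow, and it uses a hypothetical motif set and region length.

```python
import random

def random_hit_rate(motifs, region_length, k=5, trials=10000, seed=0):
    """Estimate how often a uniformly random region contains at least one motif."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    hits = 0
    for _ in range(trials):
        region = "".join(rng.choice("ACGT") for _ in range(region_length))
        if any(region[i:i + k] in motifs for i in range(region_length - k + 1)):
            hits += 1
    return hits / trials

# HYPOTHETICAL motif set and region length, for illustration only.
toy_motifs = {"CGAGA", "GAGAG", "AGAGA"}
rate = random_hit_rate(toy_motifs, region_length=10)
print(f"estimated background rate: {rate:.4f}")
```

Comparing such a background estimate with the rate observed in N2 regions or in Vh6-1 junctions gives a sense of how many assigned footprints could arise by chance. Shorter motifs and larger motif sets raise the background rate sharply.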

In summary, analyses of a large number of human IgH gene
sequences from the NCBI database uncovered a significant
contribution of Vh replacement products to the human Ab repertoire,
especially in IgH genes derived from autoimmune diseases or
anti-viral responses. Understanding how Vh replacement is
regulated and how Vh replacement products are positively or
negatively selected under normal or disease conditions will be the
focus of future studies, because modulating the level of Vh
replacement may offer unique approaches to treat different human
diseases.